ABSTRACT

This paper introduces a class of finite-memory deterministic algorithms for the following problem of hypothesis testing under a finite memory constraint. Let $X_1, X_2, X_3, \ldots$ be a sequence of independent, identically distributed Bernoulli random variables, where each $X_i$ can take on the values H or T. The problem is to decide between the two simple hypotheses $\mathcal{H}: P(X_i = H) = p$ vs. $\mathcal{T}: P(X_i = H) = q$, where $P(\mathcal{H} \text{ is true}) = \pi_h = P(\mathcal{T} \text{ is true}) = \pi_t = 1/2$. The $X_i$'s are observed sequentially and a new decision must be formulated after each observation. Let the data be summarized after each new observation by a 2n-valued statistic $V \in S = \{(1,h), \ldots, (n,h), (n,t), \ldots, (1,t)\}$, which is updated according to the rule $V_k = f(V_{k-1}, X_k)$, where $f: S \times \{H, T\} \to S$ is the transition function. Let the decision rule take action

$$d(V_k) = \begin{cases} \mathcal{H} \text{ is true} & \text{if } V_k \in \{(1,h), (2,h), \ldots, (n,h)\}, \\ \mathcal{T} \text{ is true} & \text{if } V_k \in \{(n,t), (n-1,t), \ldots, (1,t)\} \end{cases}$$

at time k. The objective is to find the function f which minimizes the probability of error

$$P(e) = \pi_h P(d = \mathcal{T} \mid \mathcal{H} \text{ true}) + \pi_t P(d = \mathcal{H} \mid \mathcal{T} \text{ true}).$$

The algorithm may be thought of as a finite state automaton, in which the inputs are the observations, the outputs are the decisions, and the states constitute the memory. In this paper, the optimal algorithms are found for a small number of states (up to 20).

I. INTRODUCTION

In the near future, computers will play even more important roles in everyday life than ever. We are expecting more compact machines with more working capabilities and requiring less human intervention. The idea of designing a machine with decision making capability is of great interest. If such a machine were to be employed in a decision process, we would face the problem of trading off the error caused by the machine operation against its complexity. In a smaller machine with limited core memory, the error caused by deciding under finite memory is even more significant. The attempt to design a small machine with limited core size which operates with the least probability of error is worth the time and effort, especially if such a device is to be used for some special tasks. Consider an exploration of a distant star with an unmanned spacecraft. If a computer with a decision making capability were to be used within the spacecraft, it would almost certainly have to be small and of limited core size. This kind of machine or automaton would be required to make decisions with minimum probability of error constrained by the available memory.

In this example, the machine or automaton acts like an input-output machine which has the observed data as the input; the outputs of the machine are the decisions. Only a finite amount of information can be stored at any time. To construct such an automaton, a form of a decision process could probably be adapted from the statistical test of hypotheses.

Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random observations drawn according to a probability measure $\mathcal{P}$ defined on an arbitrary probability space. Consider the simple hypothesis testing problem

$$H_0: \mathcal{P} = \mathcal{P}_0 \quad \text{vs.} \quad H_1: \mathcal{P} = \mathcal{P}_1.$$

Let the prior probabilities of the null and alternative hypotheses be denoted by $\pi_0$ and $\pi_1$ respectively. The goal is to find a sequence of decision rules $d_1(X_1), d_2(X_1, X_2), \ldots$ which minimize the asymptotic probability of error

$$P(e) = \lim_{n \to \infty} \frac{1}{n} E\left[\sum_{i=1}^{n} e_i\right],$$

where

$$e_i = \begin{cases} 1 & \text{if } d_i \neq H_{\text{true}}, \\ 0 & \text{if } d_i = H_{\text{true}}. \end{cases} \qquad (1)$$

Denote, for a sample of size n, by $\alpha_n$ the probability of type I error and by $\beta_n$ the probability of type II error. It is well known that $\alpha_n$ and $\beta_n$ approach zero exponentially as n becomes large. However, the decision $d_n$ depends on $(X_1, X_2, \ldots, X_n)$, so that, as n increases, the amount of data to be stored increases without bound. Some means of data reduction may therefore be desirable. Sufficient statistics can sometimes be used to reduce the required size of memory. These statistics lose no information when used. Unfortunately, the following example shows that this type of data reduction is sometimes misleading.


Let $X \sim N(\mu, 1)$ with the hypotheses

$$H_0: \mu = 1 \quad \text{vs.} \quad H_1: \mu = -1,$$

where $\pi_0 = \pi_1 = 1/2$. A sufficient statistic for this test is $Y_n = \sum_{i=1}^{n} X_i$, where $Y_n$ is the value of the statistic after n observations. A simple optimal decision scheme is

$$\text{decide } H_0 \text{ if } Y_n \geq 0, \qquad \text{decide } H_1 \text{ if } Y_n < 0.$$

The updating rule for $Y_n$ is given by

$$Y_{n+1} = Y_n + X_{n+1}.$$

Thus $Y_n$ contains all the desired information about $(X_1, X_2, \ldots, X_n)$, and only $Y_n$ needs to be stored. However, $Y_n$ is real-valued, so that a potentially infinite amount of memory is needed to store it alone. If the memory can store a real number then it can store any number of real numbers, so that no saving is actually achieved. Rounding off the sufficient statistic to some finite number of digits need not be optimal. In fact, Cover [2] has shown that the error probability need not tend to zero if a rounded-off statistic is used. If constrained by finite memory, some other statistic must be devised together with some appropriate decision rule.

Before further discussion of this topic, it is worthwhile to survey the literature concerning decision models constrained by finite memory. The first mention of finite memory in connection with a statistical decision problem is to be found in [1]. This paper by Robbins discussed the problem of choosing one of two ways of action, each of which may lead to success or failure, in such a way as to maximize the long-run proportion of successes obtained. The choice each time is allowed to depend only on a finite number of past observations. Suppose we have two coins and we wish to maximize the number of heads thrown during a sequence of tosses. If we had prior knowledge of which coin has the larger probability of turning up a head, we should use it exclusively, irrespective of the outcomes of previous tosses.


Then, with probability 1,

$$\lim_{n \to \infty} \frac{\text{number of heads in first } n \text{ tosses}}{n} = \max(p_1, p_2),$$

where $p_i$ is the probability of a head for the $i$th coin.

If no information about $p_1$ and $p_2$ is available, then there still is some decision rule which asymptotically achieves the above limit. Robbins considered rules of the following kind. A rule is said to be of type r if the decision as to which coin will be used for the $n$th toss depends only on the results of tosses $n-r, n-r+1, \ldots, n-1$. "Define the rule $R_r$ as follows: Start tossing with coin 1, stop if the first toss is a tail, otherwise continue tossing until a run of r consecutive tails occurs and then stop. This defines the first block of tosses with coin 1. Now start tossing with coin 2 and apply the same rule, obtaining the first block of tosses with coin 2. Then start again with coin 1 and apply the same rule, obtaining the second block of tosses with coin 1 and so on indefinitely, thus generating an infinite sequence of tosses consisting of alternate blocks of tosses with coin 1 and 2." With the rule $R_r$ of type r so defined, Robbins proved that with probability 1

$$\lim_{n \to \infty} \frac{\text{number of heads in first } n \text{ tosses}}{n} = \frac{p_1 (1-p_2)^r + p_2 (1-p_1)^r}{(1-p_1)^r + (1-p_2)^r},$$

and note that

$$\lim_{r \to \infty} \frac{p_1 (1-p_2)^r + p_2 (1-p_1)^r}{(1-p_1)^r + (1-p_2)^r} = \max(p_1, p_2).$$

So the rule $R_r$ is the best among the rules of type r for maximizing the long-run proportion of heads obtained.

Although Robbins in his paper did not consider the hypothesis testing problem, the paper nevertheless stimulated the idea of using a finite memory in statistical decision. By using the same kind of finite memory of the past as in [1], Cover [2] developed a decision rule using a 4-state memory algorithm for the hypothesis testing problem as applied to Bernoulli trials. The details can be found in [2].
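The block rule $R_r$ quoted above is easy to simulate. The following Python sketch is illustrative only: the function names `robbins_limit` and `simulate_rule`, the seed, and the numerical values are assumptions made for this example and do not come from [1]. It compares the simulated fraction of heads with the limit formula stated above.

```python
import random


def robbins_limit(p1, p2, r):
    """Long-run proportion of heads under rule R_r (limit formula quoted above)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 * q2**r + p2 * q1**r) / (q1**r + q2**r)


def simulate_rule(p1, p2, r, tosses, seed=0):
    """Simulate rule R_r for a fixed number of tosses; return the fraction of heads."""
    rng = random.Random(seed)          # seeded so that a run is reproducible
    probs = (p1, p2)
    coin, heads, done = 0, 0, 0
    while done < tosses:
        first_toss, tail_run = True, 0
        while done < tosses:           # one block with the current coin
            head = rng.random() < probs[coin]
            done += 1
            heads += head
            if first_toss and not head:
                break                  # block ends at once on an initial tail
            first_toss = False
            tail_run = 0 if head else tail_run + 1
            if tail_run == r:
                break                  # block ends after r consecutive tails
        coin = 1 - coin                # alternate coins between blocks
    return heads / tosses


if __name__ == "__main__":
    print(simulate_rule(0.7, 0.4, 3, 200_000), robbins_limit(0.7, 0.4, 3))
```

Only the last r outcomes (and the current coin) matter for the next choice, which is what makes the rule a finite-memory one.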

The first actual finite state memory algorithm was proposed in [3] and [4] by Hellman and Cover. Here the past observations were stored as a state of a machine constrained to have only m states. Consider the two-hypothesis testing problem as stated earlier, where

$$H_0: \mathcal{P} = \mathcal{P}_0 \quad \text{vs.} \quad H_1: \mathcal{P} = \mathcal{P}_1,$$

and the prior probabilities are $P(H_0 \text{ is true}) = \pi_0$ and $P(H_1 \text{ is true}) = \pi_1$.

After each observation, let the data be summarized by an m-valued statistic $V \in \{1, 2, \ldots, m\}$, where the updating scheme is

$$V_n = f(V_{n-1}, X_n), \qquad f: \{1, 2, \ldots, m\} \times \mathcal{X} \to \{1, 2, \ldots, m\},$$

where $X_i \in \mathcal{X}$, the set of values which $X_i$ can take on for $i = 1, 2, \ldots$, and f is the transition function.

Let the decision rule be defined as

$$d_n = d(V_n), \qquad d: \{1, 2, \ldots, m\} \to \{H_0, H_1\}.$$

The pair (f, d) then describes a finite-state automaton with inputs $X_n$, outputs $d_n = d(V_n)$ and state space $S = \{1, 2, \ldots, m\}$. Here $V_n$ is the state at time n. Under hypothesis $H_0$ or $H_1$, the sequence $V_n$, with some specific initial state, forms a Markov chain over the state space S. In order to minimize

$$P(e) = \lim_{n \to \infty} \frac{1}{n} E\left[\sum_{i=1}^{n} e_i\right],$$

where $e_i$ is the error as defined in (1), we need to find the optimal pair (f, d). Hellman and Cover have established a lower bound for P(e) as follows.

Let $f_0$ and $f_1$ be the probability densities of the sample under the respective hypotheses with respect to a dominating measure. Define the likelihood ratio to be $\ell(x) = f_0(x)/f_1(x)$. Let $\bar{\ell}$ denote the essential supremum of $\ell(x)$ and $\underline{\ell}$ the essential infimum of $\ell(x)$, where the supremum and infimum are taken over all measurable sets with positive dominating measure. Define $\gamma = \bar{\ell}/\underline{\ell}$. Then for an irreducible$^1$ m-state automaton, we have $P(e) \geq P^*$, where

$$P^* = \frac{2\sqrt{\pi_0 \pi_1 \gamma^{m-1}} - 1}{\gamma^{m-1} - 1}.$$

It was further proved in [3] that the same bound $P^*$ also holds for P(e) in a reducible (m+1)-state automaton. For the special case where $\pi_0 = \pi_1 = 1/2$, $P^*$ for the irreducible m-state automaton becomes

$$P^* = \frac{1}{\gamma^{(m-1)/2} + 1}.$$

If the m-state automaton is reducible, the bound becomes

$$P^* = \frac{1}{\gamma^{(m-2)/2} + 1}.$$

The ratio $\gamma = \bar{\ell}/\underline{\ell}$ is a measure of the separation between the two hypotheses. Notice that $P^*$ decreases exponentially with m.

$^1$ We call the automaton irreducible (reducible) if the resulting Markov chain formed by the sequence $\{V_n\}$ under both hypotheses is irreducible (reducible).

However, Hellman and Cover have shown that, except for degenerate cases, no machine can actually achieve the bound $P^*$. As an example, for the Bernoulli case, an $\epsilon$-optimal class of automata (i.e., such that for any $\epsilon > 0$ there exists an automaton with $P(e) < P^* + \epsilon$) was introduced. The detailed description of this class of automata is as follows. Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed Bernoulli random variables where $X_i$ can take on the value H or T, and let the two hypotheses be

$$H_0: P(X_i = H) = p_0 \quad \text{vs.} \quad H_1: P(X_i = H) = p_1,$$

where the prior probabilities are $\pi_0 = \pi_1 = 1/2$ and $q_i = 1 - p_i$ for $i = 0, 1$.

Without loss of generality, we may assume that $p_0 > p_1$, in which case $\bar{\ell} = p_0/p_1$, $\underline{\ell} = q_0/q_1$, and $\gamma = \bar{\ell}/\underline{\ell} = p_0 q_1 / (p_1 q_0)$. If the hypotheses are symmetric, that is, if $p_0 = q_1$, then $\gamma = (p_0/q_0)^2$; hence $P(e) \geq 1/[1 + (p_0/q_0)^{m-1}]$ for an irreducible m-state automaton, and $P(e) \geq 1/[1 + (p_0/q_0)^{m-2}]$ for a reducible m-state automaton.
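As a quick numerical illustration of the bounds above, the following Python sketch evaluates $P^*$ in its general form and in the symmetric Bernoulli form. The function names are hypothetical and chosen only for this example; the general formula is applied here with equal priors, which is the case used throughout.

```python
import math


def hellman_cover_bound(gamma, m, pi0=0.5, pi1=0.5):
    """Lower bound P* for an irreducible m-state automaton (general form above)."""
    g = gamma ** (m - 1)
    return (2.0 * math.sqrt(pi0 * pi1 * g) - 1.0) / (g - 1.0)


def symmetric_bernoulli_bound(p0, m, reducible=False):
    """P* for symmetric Bernoulli hypotheses (p0 = q1) with equal priors."""
    q0 = 1.0 - p0
    exponent = m - 2 if reducible else m - 1
    return 1.0 / (1.0 + (p0 / q0) ** exponent)


if __name__ == "__main__":
    p0 = 0.6
    gamma = (p0 / (1.0 - p0)) ** 2      # separation for the symmetric case
    for m in (2, 4, 8, 16):
        print(m, hellman_cover_bound(gamma, m), symmetric_bernoulli_bound(p0, m))
```

For equal priors the two functions agree, which is just the algebraic identity $(\gamma^{(m-1)/2} - 1)/(\gamma^{m-1} - 1) = 1/(\gamma^{(m-1)/2} + 1)$.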


Let the transition function f be defined as follows (see Figure 1):

$$f(i, X) = \begin{cases} i+1 & \text{if } X = H, \\ i-1 & \text{if } X = T, \end{cases} \qquad \text{for } i = 2, 3, \ldots, m-1,$$

$$f(1, X) = \begin{cases} 2 & \text{with probability } \delta > 0 \text{ if } X = H, \\ 1 & \text{otherwise,} \end{cases}$$

$$f(m, X) = \begin{cases} m-1 & \text{with probability } k\delta > 0 \text{ if } X = T, \\ m & \text{otherwise,} \end{cases}$$

where k = 1 for the case of symmetric hypotheses.

[Figure 1: transition diagram of the m-state automaton defined above.]

Transitions are made only to adjacent states, when the events X = H or X = T are observed; otherwise, the state does not change. This automaton will reach an end state only on strong evidence in support of the corresponding hypothesis (i.e., to enter state m the preference for $H_0$ must be strong, and vice versa for $H_1$). Moreover, there is only a very small chance of leaving the end states when $\delta$ is small. A decision made at the end states results in the smallest probability of error and, as $\delta \to 0$, the error probability P(e) should asymptotically approach $P^*$.

The Hellman-Cover algorithm is useful in providing sequences of decisions, but is not very suitable for the case when only a single decision is required. The irreducible automaton will asymptotically approach the lower bound for P(e) after a "large enough" number of observations, and there is no way to tell how large that number should be. When we need a more reliable decision we may have to wait for a long time. In addition, the automaton requires artificial randomization for generating the probabilities $\delta$ and $k\delta$ to transit out of the end states. Such a random generator needs additional memory to be added to the automaton, and it will no longer be a finite state automaton if we need a very small $\delta$ close to zero.





On the same problem of the Bernoulli case, viewed as a two-way Bernoulli classification problem, Shubert introduced in [8] a deterministic machine which can perform as well as optimal randomized machines provided that the machine memory is increased by less than one bit. This class of algorithms uses the data source itself to provide the necessary randomization. The problem of finding a truly deterministic algorithm has been considered by Anderson in his thesis [5]. He developed a special class of symmetric (2n+3)-state algorithms with two absorbing states. The algorithm can perform a decision process without randomization.

He defined the transition function f as follows (see Figure 2):

$$\begin{aligned}
f(s,H) &= s+1, & f(s,T) &= s-\rho(s) & &\text{if } s \in \{1, 2, \ldots, n\}, \\
f(s,T) &= s-1, & f(s,H) &= s+\rho(s) & &\text{if } s \in \{-1, -2, \ldots, -n\}, \\
f(s,H) &= 1, & f(s,T) &= -1 & &\text{if } s = 0, \\
f(s,H) &= s, & f(s,T) &= s & &\text{if } s = \pm(n+1),
\end{aligned}$$


Wiles, el N Es P70) Me Se 


This algorithm, however, has a higher probability of error than the randomized machine. But it was shown in [5] that, if randomization is provided for this deterministic machine, the Hellman-Cover lower bound for P(e) can be approached arbitrarily closely.

[Figure 2: the algorithm $f = \langle \rho(1), \ldots, \rho(n) \rangle$.]

In this thesis, another class of deterministic algorithms is introduced and investigated. The class is of ergodic type, and this is the only characteristic in which it differs from the one discussed by Anderson [5]. Since there are many similarities between this paper and Anderson's, the sections concerning the proof of achieving the bound with randomization and the derivation of the asymptotic bound for P(e) are omitted. However, the determination of the probability of error in terms of the algorithm is presented in complete form. The search for optimal algorithms for small numbers of states has been done by algebraic computation in some cases and by computer search in the others. The results are summarized in Table I and Table II, and in Figures 6 and 7 to provide a clearer idea of the trend for larger numbers of states.

II. DESCRIPTION OF THE ALGORITHM


Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed Bernoulli random variables where

$$X_i \in \mathcal{X} = \{H, T\} \quad (\text{H for heads, T for tails}), \qquad i = 1, 2, 3, \ldots$$

Consider the simple hypothesis testing problem

$$\mathcal{H}: P(X_i = H) = p \quad \text{vs.} \quad \mathcal{T}: P(X_i = H) = q,$$

where $\tfrac{1}{2} < p < 1$ and $q = 1 - p$.

Denote the prior probability of $\mathcal{H}$ being true by $\pi_h$ and the prior probability of $\mathcal{T}$ being true by $\pi_t$. Here only the case $\pi_h = \pi_t = 1/2$, the symmetric hypothesis testing problem, will be considered. The sequence of random variables $X_1, X_2, X_3, \ldots$ can be viewed as successive tosses of a coin which is biased towards heads under hypothesis $\mathcal{H}$ or biased towards tails under hypothesis $\mathcal{T}$.

Let $r = \{r(i)\}_{i=2}^{n}$ be a sequence of positive integers such that $1 \leq r(i) < i$ for $i = 2, 3, \ldots, n$, where n is any positive integer.

With each r we associate a finite-memory symmetric algorithm (M, f, d) (see Figure 3), where M is defined to be a set of 2n states, $M = \{(1,h), (2,h), \ldots, (n,h), (n,t), \ldots, (2,t), (1,t)\}$. The subscripts h and t indicate the states in which the decision in favor of hypothesis $\mathcal{H}$ and $\mathcal{T}$ respectively is made. In other words, the decision rule d is $d\{(i,h)\} = \mathcal{H}$ and $d\{(i,t)\} = \mathcal{T}$ for $i = 1, 2, \ldots, n$.

The transition function f is defined by

$$\begin{aligned}
f\{(i,h), H\} &= (r(i), h), & i &= 2, 3, \ldots, n, \\
f\{(i,t), T\} &= (r(i), t), & i &= 2, 3, \ldots, n, \\
f\{(i,h), T\} &= (i+1, h), & i &= 1, 2, \ldots, n-1, \\
f\{(i,t), H\} &= (i+1, t), & i &= 1, 2, \ldots, n-1, \\
f\{(1,h), H\} &= (1,h), \\
f\{(1,t), T\} &= (1,t), \\
f\{(n,h), T\} &= (n,t), \\
f\{(n,t), H\} &= (n,h).
\end{aligned}$$


With the sequence $X_1, X_2, \ldots$ of independent identically distributed random variables as the input, the states of the algorithm (M, f, d) form an ergodic Markov chain. The transition probabilities are

$$\begin{aligned}
P\{(i,h) \to (r(i),h)\} &= p, & P\{(i,t) \to (r(i),t)\} &= q, & i &= 2, 3, \ldots, n, \\
P\{(i,h) \to (i+1,h)\} &= q, & P\{(i,t) \to (i+1,t)\} &= p, & i &= 1, 2, \ldots, n-1, \\
P\{(1,h) \to (1,h)\} &= p, & P\{(1,t) \to (1,t)\} &= q, \\
P\{(n,h) \to (n,t)\} &= q, & P\{(n,t) \to (n,h)\} &= p
\end{aligned}$$

under the hypothesis $\mathcal{H}$. Under the hypothesis $\mathcal{T}$, the transition probabilities have the same form as above with p and q interchanged.

[Figure 3: the algorithm $f_n = \langle r(2), r(3), \ldots, r(n) \rangle$, where $r(i)$, $i = 2, 3, \ldots, n$, is the state to which the return transition from state i leads.]
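The transition rule and probabilities above can be sketched in code. The following Python fragment is a minimal illustration, not the program used in the thesis: the function name `build_transition_matrix` and the list-of-lists matrix layout are assumptions of this example. It builds the 2n x 2n transition matrix under hypothesis $\mathcal{H}$ for a given sequence r, with the states ordered (1,h), ..., (n,h), (n,t), ..., (1,t).

```python
def build_transition_matrix(r, p):
    """Transition matrix under hypothesis H for r = [r(2), ..., r(n)].

    Returns (states, P), where states lists the 2n states in the order
    (1,h), ..., (n,h), (n,t), ..., (1,t) and P[a][b] is the one-step
    probability of moving from states[a] to states[b].
    """
    n = len(r) + 1
    q = 1.0 - p
    ret = {1: 1}                                   # convention r(1) = 1
    for i, target in enumerate(r, start=2):
        if not 1 <= target < i:
            raise ValueError("need 1 <= r(i) < i")
        ret[i] = target
    states = [(i, "h") for i in range(1, n + 1)] + [(i, "t") for i in range(n, 0, -1)]
    index = {s: k for k, s in enumerate(states)}
    P = [[0.0] * (2 * n) for _ in states]
    for i in range(1, n + 1):
        # h side: a head (prob. p) returns to r(i); a tail (prob. q) moves up,
        # or crosses over to (n,t) from the last state.
        P[index[(i, "h")]][index[(ret[i], "h")]] += p
        P[index[(i, "h")]][index[(i + 1, "h") if i < n else (n, "t")]] += q
        # t side: the mirror image with the roles of heads and tails swapped.
        P[index[(i, "t")]][index[(ret[i], "t")]] += q
        P[index[(i, "t")]][index[(i + 1, "t") if i < n else (n, "h")]] += p
    return states, P


if __name__ == "__main__":
    states, P = build_transition_matrix([1, 1, 3], 0.75)
    for s, row in zip(states, P):
        print(s, row)
```

Interchanging p and q in the call gives the matrix under hypothesis $\mathcal{T}$.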

From now on, the specific form of the algorithm (M, f, d) will be denoted by $f_n = \langle r(2), r(3), \ldots, r(n) \rangle$. Figure 4 shows the algorithm and the transition matrix for the case where n = 4 and $f_4 = \langle 1, 1, 3 \rangle$.

[Transition diagram for $f_4 = \langle 1, 1, 3 \rangle$.]

        (1,h)  (2,h)  (3,h)  (4,h)  (4,t)  (3,t)  (2,t)  (1,t)
(1,h)     p      q      0      0      0      0      0      0
(2,h)     p      0      q      0      0      0      0      0
(3,h)     p      0      0      q      0      0      0      0
(4,h)     0      0      p      0      q      0      0      0
(4,t)     0      0      0      p      0      q      0      0
(3,t)     0      0      0      0      p      0      0      q
(2,t)     0      0      0      0      0      p      0      q
(1,t)     0      0      0      0      0      0      p      q

Figure 4. Illustration of the algorithm diagram and the transition matrix for the case n = 4 and $f_4 = \langle r(2), r(3), r(4) \rangle = \langle 1, 1, 3 \rangle$.

III. DETERMINATION OF ERROR PROBABILITY


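Let $S_h = \{(1,h), (2,h), \ldots, (n,h)\}$ and $S_t = \{(n,t), (n-1,t), \ldots, (2,t), (1,t)\}$ be the two subsets of the set of states M. Define $\mu(s)$, $s \in S_h \cup S_t$, to be the stationary probabilities, and let

$$\mu(S_h) = \sum_{s \in S_h} \mu(s) \quad \text{and} \quad \mu(S_t) = \sum_{s \in S_t} \mu(s)$$

be the stationary probabilities of the state being in $S_h$ and $S_t$ respectively. With the decision rule d, the probability of error can be written as

$$P(e) = \pi_h P(d = \mathcal{T} \mid \mathcal{H}) + \pi_t P(d = \mathcal{H} \mid \mathcal{T}) = \tfrac{1}{2} P(s \in S_t \mid \mathcal{H}) + \tfrac{1}{2} P(s \in S_h \mid \mathcal{T}).$$

By the symmetry of the algorithm we have $P(s \in S_t \mid \mathcal{H}) = P(s \in S_h \mid \mathcal{T})$, that is,

$$P(e) = P(s \in S_t \mid \mathcal{H}) = \frac{P(s \in S_t \mid \mathcal{H})}{P(s \in S_h \mid \mathcal{H}) + P(s \in S_t \mid \mathcal{H})} = \left[1 + \frac{P(s \in S_h \mid \mathcal{H})}{P(s \in S_t \mid \mathcal{H})}\right]^{-1} = \left[1 + \frac{\mu(S_h)}{\mu(S_t)}\right]^{-1}.$$

In order to obtain an explicit expression for the probability of error in terms of the algorithm $f_n$, we now prove the following proposition.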
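Proposition 1: Let $f_n = \langle r(2), r(3), \ldots, r(n) \rangle$ be given. Then

$$\frac{\mu(S_h)}{\mu(S_t)} = \left(\frac{p}{q}\right)^n \frac{A_n}{B_n},$$

where $A_n$ and $B_n$ are polynomials in q and p respectively satisfying the recurrence relations

$$A_{n+1} = q^n + p \sum_{\ell = r(n+1)}^{n} A_\ell \, q^{n-\ell}, \qquad A_1 = 1, \qquad (a)$$

$$B_{n+1} = p^n + q \sum_{\ell = r(n+1)}^{n} B_\ell \, p^{n-\ell}, \qquad B_1 = 1. \qquad (b)$$

Hence, both $A_n$ and $B_n$ have integral coefficients and are of degree less than n.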
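The proposition can be checked numerically for small cases. The sketch below is an illustration under stated assumptions rather than part of the original work: the function names `recurrence_AB` and `stationary_ratio` are hypothetical, and the stationary distribution is approximated by plain power iteration on the chain described in Section II instead of by the determinant argument used in the proof.

```python
from fractions import Fraction


def recurrence_AB(r, p):
    """Return (A_n, B_n) from recurrences (a) and (b) for r = [r(2), ..., r(n)]."""
    q = 1 - p
    A, B = [None, 1], [None, 1]                 # 1-based lists: A[1] = B[1] = 1
    for n, rn1 in enumerate(r, start=1):        # step n builds A_{n+1} from r(n+1)
        A.append(q**n + p * sum(A[l] * q**(n - l) for l in range(rn1, n + 1)))
        B.append(p**n + q * sum(B[l] * p**(n - l) for l in range(rn1, n + 1)))
    return A[-1], B[-1]


def stationary_ratio(r, p, sweeps=5000):
    """Approximate mu(S_h)/mu(S_t) under hypothesis H by power iteration."""
    n = len(r) + 1
    q = 1.0 - p
    ret = [0, 1] + list(r)                      # ret[i] = r(i), with r(1) = 1
    h = [0.0] + [1.0 / (2 * n)] * n             # h[i] approximates mu(i,h)
    t = [0.0] + [1.0 / (2 * n)] * n             # t[i] approximates mu(i,t)
    for _ in range(sweeps):
        nh, nt = [0.0] * (n + 1), [0.0] * (n + 1)
        for i in range(1, n + 1):
            nh[ret[i]] += p * h[i]              # head in (i,h): return to r(i)
            nt[ret[i]] += q * t[i]              # tail in (i,t): return to r(i)
            if i < n:
                nh[i + 1] += q * h[i]
                nt[i + 1] += p * t[i]
            else:
                nt[n] += q * h[n]               # cross from (n,h) to (n,t)
                nh[n] += p * t[n]               # cross from (n,t) to (n,h)
        h, t = nh, nt
    return sum(h) / sum(t)


if __name__ == "__main__":
    p = 0.7
    A, B = recurrence_AB([1, 1, 3], p)
    print((p / (1 - p)) ** 4 * A / B, stationary_ratio([1, 1, 3], p))
    print(recurrence_AB([1, 1, 3], Fraction(7, 10)))   # exact rational values
```

Passing a `Fraction` for p keeps the recurrence exact, which is convenient when comparing against the closed forms in Appendix A.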


Proof of Proposition 1

Let P be the transition matrix for the chain $f_n$, where the first n rows and columns correspond to states (1,h), (2,h), ..., (n,h) and the following n rows and columns to states (n,t), (n-1,t), ..., (1,t). Let $\mu = (\mu(1,h), \ldots, \mu(n,h), \mu(n,t), \ldots, \mu(1,t))$ be the stationary distribution, so that

$$\mu (I - P) = 0, \qquad (1)$$

where I is the identity matrix. Partition the matrix P into four submatrices in the form

$$P = \begin{pmatrix} P_h & Q_h \\ Q_t & P_t \end{pmatrix},$$

where $P_h$ is an n x n matrix, with rows and columns labelled 1, 2, ..., n from the upper left corner, of the form

$$(P_h)_{i,j} = \begin{cases} p & \text{if } j = r(i), \\ q & \text{if } j = i+1 \text{ and } i < n, \\ 0 & \text{otherwise,} \end{cases} \qquad r(1) = 1,$$

and $P_t$ is an n x n matrix, with rows and columns labelled n, n-1, ..., 1 from the upper left corner, of the form

$$(P_t)_{i,j} = \begin{cases} q & \text{if } j = r(i), \\ p & \text{if } j = i+1 \text{ and } i < n, \\ 0 & \text{otherwise.} \end{cases}$$

As can be seen, these two matrices have their entries arranged in the same pattern, except that they are opposite to each other in both position and p, q values.

Notice that each row of $P_h$ other than the last contains exactly one entry q, namely the (i, i+1) one, and that the labelling of rows and columns of $P_t$ begins at the lower right corner while the labelling of $P_h$ begins from the upper left corner. The off-diagonal matrices $Q_h$ and $Q_t$ consist of all zeros except for the lower left corner entry of $Q_h$, which is q, and the upper right corner entry of $Q_t$, which is p.

With this partition, equation (1) decomposes into two equations

$$\mu_h (I - P_h) = (0, 0, \ldots, 0, \mu(n,t)\,p), \qquad (2)$$

$$\mu_t (I - P_t) = (\mu(n,h)\,q, 0, \ldots, 0), \qquad (3)$$

where

$$\mu_h = (\mu(1,h), \ldots, \mu(n,h)), \qquad \mu_t = (\mu(n,t), \ldots, \mu(1,t)).$$

From (2) we have $\mu_h = (0, 0, \ldots, 0, \mu(n,t)\,p)\,(I - P_h)^{-1}$. Let $d_{i,j}$ be the (i,j)th entry of $(I - P_h)^{-1}$. Then

$$\mu(i,h) = \mu(n,t)\,p\,d_{n,i}, \qquad i = 1, 2, \ldots, n. \qquad (4)$$

Using the formula for matrix inversion we get

$$d_{n,i} = \frac{(I - P_h)_{(i,n)}}{|I - P_h|},$$

where $|I - P_h|$ is the determinant of $(I - P_h)$ and $(I - P_h)_{(i,n)}$ is the (i,n)th cofactor of the $(I - P_h)$ matrix. Hence

$$\mu(S_h) = \sum_{i=1}^{n} \mu(i,h) = \mu(n,t)\,p \sum_{i=1}^{n} d_{n,i} = \frac{\mu(n,t)\,p}{|I - P_h|} \sum_{i=1}^{n} (I - P_h)_{(i,n)}. \qquad (5)$$

Let $A_n$ be the determinant of the n x n matrix obtained from $(I - P_h)$ after replacing the nth column of the $(I - P_h)$ matrix by a column of 1's,

$$A_n = \begin{vmatrix} q & -q & & & 1 \\ -p & 1 & -q & & 1 \\ \vdots & & \ddots & \ddots & \vdots \\ & -p & & 1 & 1 \\ & & -p & & 1 \end{vmatrix}, \qquad (6)$$

where row i carries the entry $-p$ in column $r(i)$. Expanding $A_n$ along this last column we obtain

$$A_n = \sum_{i=1}^{n} (A_n)_{(i,n)}.$$

Since the (i,n)th cofactors of $A_n$ and of $(I - P_h)$ are identical, $\mu(S_h)$ in (5) can be rewritten as

$$\mu(S_h) = \frac{A_n}{|I - P_h|}\,\mu(n,t)\,p.$$

Since the only transition between $S_h$ and $S_t$ is through states (n,h) and (n,t), we must have

$$\mu(n,t)\,p = \mu(n,h)\,q \qquad (7)$$

in the stationary regime.


Using the identity in (4) we get

$$\mu(n,h) = \mu(n,t)\,p\,\frac{(I - P_h)_{(n,n)}}{|I - P_h|},$$

and substituting in (7) we get

$$|I - P_h| = q\,(I - P_h)_{(n,n)}.$$

Consider $(I - P_h)_{(n,n)}$, the determinant obtained from $(I - P_h)$ by deleting its last row and last column. It is the same as the determinant of the (n-1) x (n-1) matrix $I - P_h$ obtained for the chain $f_{n-1}$. Putting temporarily a superscript (n) for the number of states in $S_h$, we have a recurrence relation

$$|I - P_h^{(n)}| = q\,|I - P_h^{(n-1)}|.$$

Since

$$|I - P_h^{(1)}| = |1 - p| = q,$$

we obtain

$$|I - P_h^{(n)}| = q^n.$$

Thus

$$\mu(S_h) = \frac{A_n}{q^n}\,\mu(n,t)\,p.$$

Going back to (3) and repeating all the steps above, we end up with the similar expression

$$\mu(S_t) = \frac{B_n}{p^n}\,\mu(n,h)\,q,$$

where $B_n$ is a determinant of order n: it is the determinant of the matrix $(I - P_t)$ after we replace its nth column (the leftmost one) by a column of 1's.

Hence, using (7) again, we have

$$\frac{\mu(S_h)}{\mu(S_t)} = \left(\frac{p}{q}\right)^n \frac{A_n}{B_n},$$

and it is left to prove that $A_n$ and $B_n$ have the form stated in (a) and (b).








Consider $A_n$ first. To evaluate this determinant let

$$I_1 = \{i : i = 2, 3, \ldots, n;\ r(i) = 1\}.$$

Notice that $I_1$ is the set of exactly those row indices i for which the (i,1) entries in (6) are $-p$. Multiply the first row in (6) by p/q and add it to all rows such that $i \in I_1$. This operation does not change the determinant of the matrix. After it, the first column of the determinant is $(q, 0, \ldots, 0)^T$, and the entries in the last column of rows $i = 2, \ldots, n$ are given by

$$t^{(n)}_{2,i} = \begin{cases} 1 + \dfrac{p}{q} & \text{if } i \in I_1, \\ 1 & \text{if } i \notin I_1. \end{cases}$$

Expanding this determinant along the first column we have

$$D^{(n)}_1 = q\,D^{(n)}_2,$$

where $D^{(n)}_1 = A_n$ is the original determinant (6) and $D^{(n)}_2$ is the determinant that remains after the first row and column are deleted; it has the same structure as (6), with last column $t^{(n)}_{2,i}$.

This determinant is of order n-1. Notice that the entries in the first column of the determinant $D^{(n)}_2$ are $-p$ only for row indices $i = 3, \ldots, n$ such that either $r(i) = 1$ or $r(i) = 2$. Hence, letting $I_2 = \{i : i = 3, \ldots, n;\ r(i) \leq 2\}$, multiplying the first row in $D^{(n)}_2$ by p/q and adding it to the rows with $i \in I_2$, the first column of this determinant becomes $(q, 0, \ldots, 0)^T$. The entries in the last column are

$$t^{(n)}_{3,i} = \begin{cases} t^{(n)}_{2,i} + \dfrac{p}{q}\,t^{(n)}_{2,2} & \text{if } i \in I_2, \\ t^{(n)}_{2,i} & \text{if } i \notin I_2. \end{cases}$$

Expanding again along the first column we have

$$D^{(n)}_2 = q\,D^{(n)}_3,$$

where $D^{(n)}_3$ is of order n-2 and has last column $t^{(n)}_{3,i}$, $i = 3, \ldots, n$.

Proceeding in this fashion we obtain a sequence of determinants

$$D^{(n)}_1, D^{(n)}_2, \ldots, D^{(n)}_n, \qquad (8)$$

where the determinant $D^{(n)}_k$ is of order n-k+1. The entries $t^{(n)}_{k,i}$ in the last column of $D^{(n)}_k$ satisfy the recurrence relation

$$t^{(n)}_{k+1,i} = \begin{cases} t^{(n)}_{k,i} + \dfrac{p}{q}\,t^{(n)}_{k,k} & \text{if } i \in I_k, \\ t^{(n)}_{k,i} & \text{if } i \notin I_k, \end{cases} \qquad (9)$$

for $i = k+1, \ldots, n$, with $t^{(n)}_{1,i} = 1$, where

$$I_k = \{i : i = k+1, k+2, \ldots, n;\ r(i) \leq k\}.$$

Further

$$D^{(n)}_k = q\,D^{(n)}_{k+1},$$

so that

$$D^{(n)}_1 = q^{n-1} D^{(n)}_n = q^{n-1}\,t^{(n)}_{n,n}. \qquad (10)$$


Consider the determinant $D^{(n+1)}_1$ of order n+1 obtained by using the same sequence r, extended by the term $r(n+1)$. Applying the above procedure to $D^{(n+1)}_1 = A_{n+1}$ we obtain a sequence

$$D^{(n+1)}_1, D^{(n+1)}_2, \ldots, D^{(n+1)}_{n+1}, \qquad (11)$$

where the last columns of the determinants $D^{(n+1)}_k$ again satisfy (9) with n replaced by n+1. Arrange now the last columns of the sequences (8) and (11) into triangular arrays as follows:

$$T^{(n)} = \begin{matrix} t^{(n)}_{1,1} & & & \\ t^{(n)}_{1,2} & t^{(n)}_{2,2} & & \\ \vdots & & \ddots & \\ t^{(n)}_{1,n} & t^{(n)}_{2,n} & \cdots & t^{(n)}_{n,n} \end{matrix} \qquad \text{and} \qquad T^{(n+1)} = \begin{matrix} t^{(n+1)}_{1,1} & & & \\ t^{(n+1)}_{1,2} & t^{(n+1)}_{2,2} & & \\ \vdots & & \ddots & \\ t^{(n+1)}_{1,n+1} & t^{(n+1)}_{2,n+1} & \cdots & t^{(n+1)}_{n+1,n+1} \end{matrix}$$

Since for $i \leq n$, by the definition of the sets $I_k$, membership of i in $I_k$ depends only on $r(i)$, the first n rows of $T^{(n)}$ and $T^{(n+1)}$ are identical, i.e.,

$$t^{(n)}_{k,i} = t^{(n+1)}_{k,i}, \qquad k = 1, \ldots, i; \quad i = 1, 2, \ldots, n. \qquad (12)$$

Next, by (9),

$$t^{(n+1)}_{k,k} = 1 + \frac{p}{q} \sum_{\ell = r(k)}^{k-1} t^{(n+1)}_{\ell,\ell}, \qquad k = 2, \ldots, n+1.$$

In particular, for k = n+1, since $r(n+1) < n+1$,

$$t^{(n+1)}_{n+1,n+1} = 1 + \frac{p}{q} \sum_{\ell = r(n+1)}^{n} t^{(n+1)}_{\ell,\ell}. \qquad (13)$$

But by (12), for $\ell < n+1$ we have

$$t^{(n+1)}_{\ell,\ell} = t^{(\ell)}_{\ell,\ell}.$$

Hence, from (13) we have

$$t^{(n+1)}_{n+1,n+1} = 1 + \frac{p}{q} \sum_{\ell = r(n+1)}^{n} t^{(\ell)}_{\ell,\ell}.$$

Hence, by (10),

$$\frac{D^{(n+1)}_1}{q^n} = 1 + \frac{p}{q} \sum_{\ell = r(n+1)}^{n} \frac{D^{(\ell)}_1}{q^{\ell-1}}.$$

Since $D^{(n)}_1 = A_n$, we have

$$A_{n+1} = q^n + p \sum_{\ell = r(n+1)}^{n} A_\ell \, q^{n-\ell}, \qquad n = 1, 2, \ldots, \qquad (14)$$

where $A_1 = 1$.

The recurrence relation for $B_n$ is established in exactly the same fashion. Notice that the only difference between the determinants $A_n$ and $B_n$ is that p and q are in reversed positions; thus for the same sequence r we have

$$B_{n+1} = p^n + q \sum_{\ell = r(n+1)}^{n} B_\ell \, p^{n-\ell}, \qquad n = 1, 2, \ldots \qquad (15)$$

With (14) and (15) we have completed the proof of Proposition 1.

IV. DETERMINATION OF OPTIMAL ALGORITHM FOR SMALL n


Since the error probability is

$$P(e) = \left[1 + \frac{\mu(S_h)}{\mu(S_t)}\right]^{-1},$$

in order to minimize P(e) we need to maximize the ratio $\mu(S_h)/\mu(S_t)$.

From the last section, we have

$$\frac{\mu(S_h)}{\mu(S_t)} = \left(\frac{p}{q}\right)^n \frac{A_n}{B_n},$$

where $\tfrac{1}{2} < p < 1$ and $q = 1 - p$.

Thus the ratio $\mu(S_h)/\mu(S_t)$ is maximized if $A_n/B_n$ is maximized. The ratio $A_n/B_n$ depends on the sequence $f_n = \langle r(2), r(3), \ldots, r(n) \rangle$. After we have determined all possible algorithms $f_n$ (there are (n-1)! possible algorithms in total) and computed the corresponding values of $A_n/B_n$, we can search for the maximum $A_n/B_n$. This way we can identify the optimal algorithm $f_n^*$ and obtain the maximum value of $\mu(S_h)/\mu(S_t)$.

In what follows we carry out this program for the case of small n. For the case n = 1, we have P(e) = q, which means that a decision made without any memory has probability of error equal to q.
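The search program described above can be sketched in a few lines. This is a minimal illustration in Python, not the program of Appendix B; the function names `ratio_AB`, `best_algorithm` and `error_probability` are assumptions of this example, and floating-point arithmetic is used throughout.

```python
from itertools import product


def ratio_AB(r, p):
    """A_n / B_n from recurrences (a) and (b) for r = (r(2), ..., r(n))."""
    q = 1.0 - p
    A, B = [None, 1.0], [None, 1.0]             # 1-based lists: A[1] = B[1] = 1
    for n, rn1 in enumerate(r, start=1):        # step n builds A_{n+1}, B_{n+1}
        A.append(q**n + p * sum(A[l] * q**(n - l) for l in range(rn1, n + 1)))
        B.append(p**n + q * sum(B[l] * p**(n - l) for l in range(rn1, n + 1)))
    return A[-1] / B[-1]


def best_algorithm(n, p):
    """Exhaustive search over all (n-1)! sequences with 1 <= r(i) < i."""
    candidates = product(*(range(1, i) for i in range(2, n + 1)))
    return max(candidates, key=lambda r: ratio_AB(r, p))


def error_probability(r, p):
    """P(e) = 1 / (1 + (p/q)^n A_n/B_n) for the algorithm r."""
    n = len(r) + 1
    return 1.0 / (1.0 + (p / (1.0 - p)) ** n * ratio_AB(r, p))


if __name__ == "__main__":
    for n in range(1, 8):
        r = best_algorithm(n, 0.9)
        print(n, r, error_probability(r, 0.9))
```

Because the number of candidates grows as (n-1)!, this brute-force approach is practical only for small n, which matches the experience reported in the text.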

Ratios $A_n/B_n$ were calculated algebraically for n = 2, 3, 4 and 5. For values of $p \in (\tfrac{1}{2}, 1)$ it was found that

$$\left(\frac{A_2}{B_2}\right)^* = 1 \quad \text{for any } f_2,$$

$$\left(\frac{A_3}{B_3}\right)^* = 1 \quad \text{for any } f_3,$$

$$\left(\frac{A_4}{B_4}\right)^* = \frac{q^3 + p}{p^3 + q} \quad \text{for } f_4^* = \langle 1, 1, 3 \rangle,$$

$$\left(\frac{A_5}{B_5}\right)^* = \frac{q^3 + p^2 q + p^2}{p^3 + p q^2 + q^2} \quad \text{for } f_5^* = \langle 1, 1, 2, 4 \rangle.$$

The algebraic work to determine these values is contained in Appendix A.

For the algorithms with n = 6 up to 10, several values of $p > \tfrac{1}{2}$ were chosen. The search for the optimal algorithm was performed on an IBM 360/67 in double precision. The program, found in Appendix B, became too time consuming once n exceeded 10. For values of p in the vicinity of 1, the values p = .99 and .999 were used. However, in the vicinity of $\tfrac{1}{2}$, the optimal algorithm was determined by a Taylor series expansion around $\tfrac{1}{2} + \epsilon$, neglecting terms in $\epsilon$ of power greater than or equal to 2. The derivation of this expansion was as follows:


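$$A(\tfrac{1}{2} - \epsilon) = A(\tfrac{1}{2}) - \epsilon A'(\tfrac{1}{2}) + \frac{\epsilon^2}{2!} A''(\tfrac{1}{2}) - \cdots + R,$$

where R is the remainder term.

Ignoring the terms in $\epsilon^n$ with $n \geq 2$, we have $A(\tfrac{1}{2} - \epsilon) = A(\tfrac{1}{2}) - \epsilon A'(\tfrac{1}{2})$. Similarly $B(\tfrac{1}{2} + \epsilon) = A(\tfrac{1}{2}) + \epsilon A'(\tfrac{1}{2})$. Consider

$$X = \frac{A_n(\tfrac{1}{2} - \epsilon)}{B_n(\tfrac{1}{2} + \epsilon)} = \frac{A_n - \epsilon A_n'}{A_n + \epsilon A_n'} = \frac{1 - \epsilon\,A_n'/A_n}{1 + \epsilon\,A_n'/A_n}.$$

To maximize X, we minimize $A_n'/A_n$. Differentiating $A_{k+1}$ with respect to q we get

$$A_{k+1}' = k\,q^{k-1} - \sum_{\ell = r(k+1)}^{k} A_\ell \, q^{k-\ell} + p \sum_{\ell = r(k+1)}^{k} \left[ A_\ell' \, q^{k-\ell} + (k-\ell)\,A_\ell \, q^{k-\ell-1} \right],$$

which is recursive in $A_\ell$ and $A_\ell'$, where $q = \tfrac{1}{2}$.

The program for the search of the ratio $A_n'/A_n$ is in Appendix C. Notice that this approximation yields the same result, in terms of the optimal algorithm, as the case p = .51 shown in Tables I and II.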
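The first-order criterion can be sketched as follows. This Python fragment is an illustration only and is not the program of Appendix C; the function names `A_and_slope` and `best_first_order` are hypothetical. It evaluates $A_n$ and $A_n'$ at $q = \tfrac{1}{2}$ with the recurrence for $A_{k+1}$ and its derivative, then picks the sequence minimizing $A_n'/A_n$.

```python
from itertools import product


def A_and_slope(r, q=0.5):
    """Return (A_n, dA_n/dq) at the given q for r = (r(2), ..., r(n))."""
    p = 1.0 - q
    A, dA = [None, 1.0], [None, 0.0]            # A_1 = 1, so its derivative is 0
    for k, rk1 in enumerate(r, start=1):        # step k builds A_{k+1}
        ls = range(rk1, k + 1)
        s = sum(A[l] * q**(k - l) for l in ls)
        ds = sum(dA[l] * q**(k - l) + (k - l) * A[l] * q**(k - l - 1) for l in ls)
        A.append(q**k + p * s)
        # d/dq of q^k + (1 - q) * s  =  k q^(k-1) - s + p * ds
        dA.append(k * q**(k - 1) - s + p * ds)
    return A[-1], dA[-1]


def best_first_order(n):
    """Sequence minimizing A_n'/A_n at q = 1/2 (first-order optimum near p = 1/2)."""
    candidates = product(*(range(1, i) for i in range(2, n + 1)))

    def slope_ratio(r):
        a, da = A_and_slope(r)
        return da / a

    return min(candidates, key=slope_ratio)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, best_first_order(n))
```

For n up to 3 every sequence gives the same value of $A_n'/A_n$, so the minimizer returned there is simply the first candidate enumerated.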

These results are summarized in Table I, Table II and Figures 6 and 7. Table II presents the results of Table I in the form used in [5] by Anderson. The results are not much different, although the algorithm presented here has no stopping rule. After some number of observations have been received, the decisions are made with the same probability of error as in [5]. Anderson in his paper made the point that the finite-memory algorithm seems to operate in a fashion similar to a human decision making process. The result of this paper seems to support his observation. The mechanism of remembering and forgetting seems to resemble somewhat the procedure of making decisions under a finite memory constraint. The fact that man can gain experience and learn during his entire life span makes one believe that there must be some kind of data summarizing mechanism such that he can make a decision with some degree of confidence at a given moment. The word experience may be one of the keys which can lead to further study of the mechanism of data summarization. The algorithm proposed in this paper can be viewed as a data summarizing procedure under simple hypotheses with Bernoulli observations. A more complicated machine can be developed by considering the general case of multiple hypothesis testing under finite state memory. The idea was discussed in [7] by Yakowitz and in [9] by Salagowicz.

Figure 6. Graph of $\ln \max (A_n/B_n)$ vs. n values (p from ...). Vertical axis: natural logarithm of $\max (A_n/B_n)$.

Figure 7. Graph of $\ln \max (A_n/B_n)$ vs. n values (p from .75 - .99). Vertical axis: natural logarithm of $\max (A_n/B_n)$.

APPENDIX A


Determination of (A, /8,)* and £* algebraically for 


3, 4, and S. Recall that we have 
_ ek k k-2 

eget ape 
_ _k k k-2 

Bin Pt aigdescs 1 TBP 


In the case of n = 2, we have the unique algorithm $f = \langle 1 \rangle$ 
because the transition from state 2 to state 1 can be done 
only by a jump of 1, or $r(2) = 1$. According to the formula 
above we have 

$$A_2 = q + p(A_1 q^0) = q + p = 1, \quad \text{where } A_1 = 1,$$

$$B_2 = p + q(B_1 p^0) = p + q = 1, \quad \text{where } B_1 = 1,$$

$$\therefore \frac{A_2}{B_2} = 1.$$

For the case n = 3, we have two possible algorithms to 
investigate, namely $f_1 = \langle 1,1 \rangle$ and $f_2 = \langle 1,2 \rangle$. 

For $f_1 = \langle 1,1 \rangle$: 

$$A_3 = q^2 + p(A_1 q + A_2 q^0), \quad A_1 = 1, \; A_2 = 1,$$

$$A_3 = q^2 + pq + p = q(q + p) + p = 1.$$

By the same derivation, 

$$B_3 = p^2 + pq + q = 1,$$

$$\therefore \left(\frac{A_3}{B_3}\right)^{1} = 1.$$

For $f_2 = \langle 1,2 \rangle$: 

$$A_3 = q^2 + p(A_2 q^0) = q^2 + p$$

and 

$$B_3 = p^2 + q.$$

But $q^2 + p = (1 - p)^2 + p = p^2 + (1 - p) = p^2 + q$, so 

$$\left(\frac{A_3}{B_3}\right)^{2} = 1.$$

Note that the superscripts on the ratio are for numbering 
purposes. 
Thus for the case n = 3, the optimal algorithm can be 
either $f^* = f_1 = \langle 1,1 \rangle$ or $f^* = f_2 = \langle 1,2 \rangle$. 
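The n = 2 and n = 3 cases can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper name `a_and_b` is hypothetical, and the recurrence it implements, $A_1 = 1$ and $A_n = q^{n-1} + p \sum_{k=r(n)}^{n-1} A_k q^{n-1-k}$ (with $B_n$ obtained by interchanging p and q), is an assumption reconstructed from the worked cases in this appendix rather than a quotation of the original formula.

```python
from fractions import Fraction


def a_and_b(r, p):
    """Return (A_n, B_n) for the jump tuple r = (r(2), ..., r(n)).

    Assumed recurrence (reconstructed from the worked cases):
        A_1 = 1,  A_n = q^(n-1) + p * sum_{k=r(n)}^{n-1} A_k * q^(n-1-k)
    B_n is the same expression with p and q interchanged.
    """
    q = 1 - p

    def run(mult, base):
        vals = [Fraction(1)]  # vals[k-1] holds A_k (or B_k)
        for n, rn in enumerate(r, start=2):
            tail = sum(vals[k - 1] * base ** (n - 1 - k) for k in range(rn, n))
            vals.append(base ** (n - 1) + mult * tail)
        return vals[-1]

    return run(p, q), run(q, p)


if __name__ == "__main__":
    p = Fraction(3, 4)  # any value in (1/2, 1) would do
    for r in [(1,), (1, 1), (1, 2)]:
        a, b = a_and_b(r, p)
        print(r, a, b, a / b)
```

Under these assumptions every ratio printed is 1, which agrees with the statement that either algorithm is optimal for n = 3.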
Before we proceed to the proof for the cases n = 4 and 5, we 
note that the expressions for $A_n$ and $B_n$ differ only in p and 
q being interchanged. For this reason, the expressions for 
$B_n$ will be presented without computation. 

Determination of $(A_n/B_n)^*$ and $f^*$ algebraically for n = 4 
and n = 5. Recall that 

$$A_n = q^{n-1} + p \sum_{k=2}^{n-r(n)+1} A_{n-k+1}\, q^{k-2}$$

$$B_n = p^{n-1} + q \sum_{k=2}^{n-r(n)+1} B_{n-k+1}\, p^{k-2}$$

with $A_1 = B_1 = 1$. 


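For the case when n = 4, the number of possible forms of the algorithm 
is (n - 1)! = 3! = 6, which are as follows: 

$$f_1 = \langle 1,1,1 \rangle, \quad f_2 = \langle 1,1,2 \rangle, \quad f_3 = \langle 1,1,3 \rangle,$$

$$f_4 = \langle 1,2,1 \rangle, \quad f_5 = \langle 1,2,2 \rangle, \quad f_6 = \langle 1,2,3 \rangle.$$

The optimal algorithm among these combinations is $f^* = f_3$ and 

$$\left(\frac{A_4}{B_4}\right)^{*} = \left(\frac{A_4}{B_4}\right)^{3} = \frac{q^3 + p}{p^3 + q},$$

where the superscript labels the combination order. 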


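Proof. 
For $f_1 = \langle 1,1,1 \rangle$: 

$$A_4 = q^3 + p(A_1 q^2 + A_2 q + A_3),$$

$$A_3 = q^2 + p(A_1 q + A_2),$$

$$A_1 = 1, \quad A_2 = 1,$$

and 

$$A_3 = q^2 + pq + p = q + p = 1,$$

hence 

$$A_4 = q^3 + pq^2 + pq + p = q^2(q + p) + pq + p = q^2 + pq + p = 1.$$

The same result can be obtained from computing $B_4$. 

$$\therefore \left(\frac{A_4}{B_4}\right)^{1} = 1.$$

It was proved in the case n = 2 that the transition 
always goes to 1. So the value of $A_2$ always equals 1, and so 
does $B_2$. 

For $f_2 = \langle 1,1,2 \rangle$: 

$$A_4 = q^3 + p(A_2 q + A_3),$$

$$A_3 = q^2 + p(A_1 q + A_2),$$

where $A_1 = A_2 = 1$, so that $A_3 = 1$ and 

$$A_4 = q^3 + pq + p, \qquad B_4 = p^3 + pq + q,$$

$$\therefore \left(\frac{A_4}{B_4}\right)^{2} = \frac{q^3 + pq + p}{p^3 + pq + q}.$$

For $f_3 = \langle 1,1,3 \rangle$: 

$$A_4 = q^3 + pA_3 = q^3 + p,$$

hence 

$$\left(\frac{A_4}{B_4}\right)^{3} = \frac{q^3 + p}{p^3 + q}.$$

For $f_4 = \langle 1,2,1 \rangle$, $f_5 = \langle 1,2,2 \rangle$ and $f_6 = \langle 1,2,3 \rangle$, respectively: 

$$A_4 = q^3 + p(A_1 q^2 + A_2 q + A_3),$$

$$A_4 = q^3 + p(A_2 q + A_3),$$

$$A_4 = q^3 + pA_3,$$

and since here $A_3 = q^2 + p$, we obtain 

$$f_4: \quad A_4 = q^3 + 2pq^2 + pq + p^2 = q^3 + 2pq^2 + p, \qquad B_4 = p^3 + 2p^2 q + q,$$

$$f_5: \quad A_4 = q^3 + pq^2 + pq + p^2 = q^2 + pq + p^2 = B_4,$$

$$f_6: \quad A_4 = q^3 + pq^2 + p^2 = q^2 + p^2 = B_4,$$

so that $(A_4/B_4)^5 = (A_4/B_4)^6 = 1$. 

Consider $1 > p$ where $p > 1/2$. Then 

$$p > p^2,$$

$$1 > p^2 - p + 1 = p^2 + q = p^2 + pq + q^2,$$

and, multiplying by $p - q > 0$, 

$$p - q > (p - q)(p^2 + pq + q^2) = p^3 - q^3,$$

$$q^3 + p > p^3 + q,$$

$$\frac{q^3 + p}{p^3 + q} > 1,$$

or $(A_4/B_4)^3 > 1$, and $(A_4/B_4)^1$, $(A_4/B_4)^5$ and $(A_4/B_4)^6$ are 
dominated. 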

Stated without proof is the following simple fact: 
Let $a > 0$, $b > 0$ and $c > 0$. 
If $a/b > 1$, then $a/b > (a + c)/(b + c)$. 


This result will be applied when needed without notice. 
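The fact follows from the identity $a/b - (a+c)/(b+c) = c(a-b)/\big(b(b+c)\big)$, whose right-hand side is positive whenever $a > b > 0$ and $c > 0$. The short check below is a sketch rather than part of the original argument; the function name `gap` and the integer grid are arbitrary choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def gap(a, b, c):
    """Return a/b - (a+c)/(b+c) as an exact fraction."""
    return Fraction(a, b) - Fraction(a + c, b + c)


checked = 0
for a, b, c in product(range(1, 8), repeat=3):
    if a > b:  # equivalent to a/b > 1 because b > 0
        # The gap matches the closed form c(a-b) / (b(b+c)) and is positive.
        assert gap(a, b, c) == Fraction(c * (a - b), b * (b + c))
        assert gap(a, b, c) > 0
        checked += 1

if __name__ == "__main__":
    print(checked, "cases satisfy a/b > (a+c)/(b+c)")
```

The same identity shows that adding a larger amount to the denominator than to the numerator only lowers the ratio further, which is how the fact is used in the comparisons that follow.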


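By the fact stated above with $c = pq$, 

$$\left(\frac{A_4}{B_4}\right)^{3} = \frac{q^3 + p}{p^3 + q} > \frac{(q^3 + p) + pq}{(p^3 + q) + pq} = \left(\frac{A_4}{B_4}\right)^{2}.$$

Obviously, 

$$\left(\frac{A_4}{B_4}\right)^{3} > \left(\frac{A_4}{B_4}\right)^{4}$$

since 

$$\frac{q^3 + p}{p^3 + q} > \frac{q^3 + 2pq^2 + p}{p^3 + 2p^2 q + q},$$

where 

$$2pq^2 < 2p^2 q.$$

Thus for n = 4, the optimal algorithm is $f^* = f_3 = \langle 1,1,3 \rangle$. 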
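The n = 4 result can also be checked by brute force over all six algorithms. The following is a minimal sketch, not the computation used in the original work. It assumes the recurrence $A_1 = 1$, $A_n = q^{n-1} + p \sum_{k=r(n)}^{n-1} A_k q^{n-1-k}$ (reconstructed from the worked cases, with $B_n$ obtained by interchanging p and q), and the names `ratio`, `all_algorithms` and `best` are hypothetical.

```python
from itertools import product


def ratio(r, p):
    """A_n / B_n for the jump tuple r = (r(2), ..., r(n)), assumed recurrence."""
    q = 1.0 - p

    def run(mult, base):
        vals = [1.0]  # vals[k-1] holds A_k (or B_k); A_1 = B_1 = 1
        for n, rn in enumerate(r, start=2):
            tail = sum(vals[k - 1] * base ** (n - 1 - k) for k in range(rn, n))
            vals.append(base ** (n - 1) + mult * tail)
        return vals[-1]

    return run(p, q) / run(q, p)


def all_algorithms(n):
    """Every tuple with 1 <= r(k) <= k-1 for k = 2..n; there are (n-1)! of them."""
    return list(product(*(range(1, k) for k in range(2, n + 1))))


def best(n, p):
    """Algorithm with the largest ratio A_n / B_n at the given p."""
    return max(all_algorithms(n), key=lambda r: ratio(r, p))


if __name__ == "__main__":
    for p in (0.6, 0.75, 0.9):
        q = 1.0 - p
        r = best(4, p)
        # Compare the enumerated maximum with the closed form (q^3+p)/(p^3+q).
        print(p, r, ratio(r, p), (q ** 3 + p) / (p ** 3 + q))
```

For the sampled values of p the enumeration selects `(1, 1, 3)`, in agreement with the algebraic argument. A grid check of this kind does not replace the proof for all p in (1/2, 1).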
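For the case of n = 5, the possible forms of algorithms 
$f = \langle r(2), r(3), r(4), r(5) \rangle$ are as follows: 

$$f_1 = \langle 1,1,1,1 \rangle, \quad f_2 = \langle 1,1,1,2 \rangle, \quad f_3 = \langle 1,1,1,3 \rangle, \quad f_4 = \langle 1,1,1,4 \rangle,$$

$$f_5 = \langle 1,1,2,1 \rangle, \quad f_6 = \langle 1,1,2,2 \rangle, \quad f_7 = \langle 1,1,2,3 \rangle, \quad f_8 = \langle 1,1,2,4 \rangle,$$

$$f_9 = \langle 1,1,3,1 \rangle, \quad f_{10} = \langle 1,1,3,2 \rangle, \quad f_{11} = \langle 1,1,3,3 \rangle, \quad f_{12} = \langle 1,1,3,4 \rangle,$$

$$f_{13} = \langle 1,2,1,1 \rangle, \quad f_{14} = \langle 1,2,1,2 \rangle, \quad f_{15} = \langle 1,2,1,3 \rangle, \quad f_{16} = \langle 1,2,1,4 \rangle,$$

$$f_{17} = \langle 1,2,2,1 \rangle, \quad f_{18} = \langle 1,2,2,2 \rangle, \quad f_{19} = \langle 1,2,2,3 \rangle, \quad f_{20} = \langle 1,2,2,4 \rangle,$$

$$f_{21} = \langle 1,2,3,1 \rangle, \quad f_{22} = \langle 1,2,3,2 \rangle, \quad f_{23} = \langle 1,2,3,3 \rangle, \quad f_{24} = \langle 1,2,3,4 \rangle.$$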
The following algebraic proof reveals that the optimal 
algorithm is $f^* = f_8 = \langle 1,1,2,4 \rangle$ and that the maximum ratio 
$A_5/B_5$ of this algorithm is $(q^3 + 2p^2 q + p^3)/(p^3 + 2q^2 p + q^3)$ 
for all values of $p \in (1/2, 1)$. 
Proof. It is simpler to break the possible f's into 
groups of 4 and identify the maximum ratio $A_5/B_5$ among the 4; 
the results of these 6 groups will be compared later. 
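The grouping strategy can be mirrored numerically: split the 24 algorithms by their prefix $\langle r(2), r(3), r(4) \rangle$, pick the best member of each group, then compare the six group winners. This is a sketch under the same assumed recurrence, $A_1 = 1$ and $A_n = q^{n-1} + p \sum_{k=r(n)}^{n-1} A_k q^{n-1-k}$ with $B_n$ obtained by interchanging p and q. The names `ratio`, `groups`, `group_winners` and `overall_best` are hypothetical.

```python
from itertools import product


def ratio(r, p):
    """A_n / B_n for the jump tuple r = (r(2), ..., r(n)), assumed recurrence."""
    q = 1.0 - p

    def run(mult, base):
        vals = [1.0]  # vals[k-1] holds A_k (or B_k); A_1 = B_1 = 1
        for n, rn in enumerate(r, start=2):
            tail = sum(vals[k - 1] * base ** (n - 1 - k) for k in range(rn, n))
            vals.append(base ** (n - 1) + mult * tail)
        return vals[-1]

    return run(p, q) / run(q, p)


# Six groups of four, keyed by the prefix (r(2), r(3), r(4)).
groups = {}
for r in product(range(1, 2), range(1, 3), range(1, 4), range(1, 5)):
    groups.setdefault(r[:3], []).append(r)


def group_winners(p):
    """Best algorithm inside each group of four at the given p."""
    return {prefix: max(members, key=lambda r: ratio(r, p))
            for prefix, members in groups.items()}


def overall_best(p):
    """Best algorithm among the six group winners."""
    return max(group_winners(p).values(), key=lambda r: ratio(r, p))


if __name__ == "__main__":
    for p in (0.6, 0.75, 0.9):
        q = 1.0 - p
        closed_form = (q ** 3 + 2 * p * p * q + p ** 3) / (p ** 3 + 2 * p * q * q + q ** 3)
        r = overall_best(p)
        print(p, r, ratio(r, p), closed_form)
```

At the sampled values of p the overall winner is `(1, 1, 2, 4)` and its ratio agrees with the closed form stated above. The margin over $\langle 1,1,1,4 \rangle$ becomes small as p approaches 1, so a numerical check on a few points is supporting evidence only.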


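$f_1 = \langle 1,1,1,1 \rangle$: from the case n = 4, $f_1 = \langle 1,1,1 \rangle$, we 
have $A_1 = A_2 = A_3 = 1$ and $A_4 = 1$. 

$$A_5 = q^4 + p(A_1 q^3 + A_2 q^2 + A_3 q + A_4) = q^4 + pq^3 + pq^2 + pq + p = 1 = B_5,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{1} = 1.$$

$f_2 = \langle 1,1,1,2 \rangle$: 

$$A_5 = q^4 + p(A_2 q^2 + A_3 q + A_4) = q^4 + pq^2 + pq + p,$$

$$B_5 = p^4 + p^2 q + pq + q,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{2} = \frac{q^4 + pq^2 + pq + p}{p^4 + p^2 q + pq + q}.$$

$f_3 = \langle 1,1,1,3 \rangle$: 

$$A_5 = q^4 + p(A_3 q + A_4) = q^4 + pq + p$$

and 

$$B_5 = p^4 + pq + q,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{3} = \frac{q^4 + pq + p}{p^4 + pq + q}.$$

$f_4 = \langle 1,1,1,4 \rangle$: 

$$A_5 = q^4 + pA_4 = q^4 + p, \qquad B_5 = p^4 + q,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{4} = \frac{q^4 + p}{p^4 + q}.$$

Consider $2pq > 0 \;\; \forall p \in (1/2, 1)$. Then 

$$1 = (p + q)^2 > p^2 + q^2,$$

and, multiplying by $(p - q)(p + q) = p - q > 0$, 

$$p - q > (p - q)(p + q)(p^2 + q^2) = p^4 - q^4,$$

$$q^4 + p > p^4 + q,$$

$$\frac{q^4 + p}{p^4 + q} > 1,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{4} > \left(\frac{A_5}{B_5}\right)^{1}.$$

Also, 

$$\frac{q^4 + p}{p^4 + q} > \frac{(q^4 + p) + pq}{(p^4 + q) + pq} = \left(\frac{A_5}{B_5}\right)^{3}$$

and 

$$\frac{q^4 + p}{p^4 + q} > \frac{(q^4 + p) + pq^2 + pq}{(p^4 + q) + p^2 q + pq} = \left(\frac{A_5}{B_5}\right)^{2} \quad \text{since } pq^2 < p^2 q.$$

$\therefore (A_5/B_5)^4$ is the maximum in this group of 4. 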
5 


$f_5 = \langle 1,1,2,1 \rangle$: from the case n = 4, the algorithm of 
the form $f_2 = \langle 1,1,2 \rangle$, we have $A_1 = A_2 = A_3 = 1$ and 
$A_4 = q^3 + pq + p$. 

$$A_5 = q^4 + p(A_1 q^3 + A_2 q^2 + A_3 q + A_4) = q^4 + pq^3 + pq^2 + pq + pq^3 + p^2 q + p^2.$$

After simplifying we get 

$$A_5 = (q^2 + p)(pq + 1), \qquad B_5 = (p^2 + q)(pq + 1),$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{5} = \frac{(q^2 + p)(pq + 1)}{(p^2 + q)(pq + 1)} = 1,$$

since $q^2 + p = p^2 + q$. 

$f_6 = \langle 1,1,2,2 \rangle$: 

$$A_5 = q^4 + p(A_2 q^2 + A_3 q + A_4) = q^4 + pq^3 + pq^2 + pq + p^2 q + p^2 = q^2 + p^2 q + p,$$

$$B_5 = p^2 + pq^2 + q,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{6} = \frac{q^2 + p^2 q + p}{p^2 + pq^2 + q}.$$

$f_7 = \langle 1,1,2,3 \rangle$: 

$$A_5 = q^4 + p(A_3 q + A_4) = q^4 + pq^3 + pq + p^2 q + p^2 = q^3 + p^2 q + p,$$

$$B_5 = p^3 + pq^2 + q,$$

and 

$$\left(\frac{A_5}{B_5}\right)^{7} = \frac{q^3 + p^2 q + p}{p^3 + pq^2 + q}.$$

$f_8 = \langle 1,1,2,4 \rangle$: 

$$A_5 = q^4 + p(A_4) = q^4 + pq^3 + p^2 q + p^2 = q^3 + p^2 q + p^2,$$

$$B_5 = p^3 + pq^2 + q^2,$$

and, writing $p^2 = p^2(p + q)$ and $q^2 = q^2(p + q)$, 

$$\left(\frac{A_5}{B_5}\right)^{8} = \frac{q^3 + p^2 q + p^2}{p^3 + pq^2 + q^2} = \frac{q^3 + 2p^2 q + p^3}{p^3 + 2pq^2 + q^3}.$$
Among 5, 6, 7 and 8, the ratio $(A_5/B_5)^8$ is the maximum. 

Consider $2pq > 0$, so that $3pq > pq$. Since $p - q > 0$, 

$$(p - q)(p^2 + q^2 + 3pq) > (p - q)(p^2 + q^2 + pq),$$

$$p^2 - q^2 + p^2 q - pq^2 > p^3 - q^3,$$

$$q^3 + p^2 q + p^2 > p^3 + pq^2 + q^2,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{8} > 1 = \left(\frac{A_5}{B_5}\right)^{5}.$$

Moreover, since $p = p^2 + pq$ and $q = q^2 + pq$, 

$$\left(\frac{A_5}{B_5}\right)^{8} = \frac{q^3 + p^2 q + p^2}{p^3 + pq^2 + q^2} > \frac{(q^3 + p^2 q + p^2) + pq}{(p^3 + pq^2 + q^2) + pq} = \left(\frac{A_5}{B_5}\right)^{7}$$

and 

$$\left(\frac{A_5}{B_5}\right)^{8} > \frac{(q^3 + p^2 q + p^2) + pq(1 + q)}{(p^3 + pq^2 + q^2) + pq(1 + p)} = \left(\frac{A_5}{B_5}\right)^{6} \quad \text{since } pq(1 + q) < pq(1 + p).$$
$f_9 = \langle 1,1,3,1 \rangle$: from the case n = 4, the form $f_3 = 
\langle 1,1,3 \rangle$, we have $A_1 = A_2 = 1$, $A_3 = 1$ and $A_4 = q^3 + p$. 

$$A_5 = q^4 + p(A_1 q^3 + A_2 q^2 + A_3 q + A_4) = q^4 + pq^3 + pq^2 + pq + pq^3 + p^2.$$

After simplifying we get 

$$A_5 = pq^3 + q^2 + p, \qquad B_5 = p^3 q + q + p^2,$$

$$\therefore \left(\frac{A_5}{B_5}\right)^{9} = \frac{pq^3 + q^2 + p}{p^3 q + p^2 + q}.$$
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Sede hae 3 4 
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= ee 
E p q 
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=q + pq + pq + pq. +pqtp * pq * pq. 
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